Abstract

Take a large bag of black and white beans, with all possible proportions considered
initially equally likely, and imagine making random extractions with reintroduction.
Twenty consecutive observations of black make us highly confident that the next bean
will be black too. On the contrary, the observation of 1010 black beans and 990 white
ones leads us to judge the two possible outcomes about equally probable. According to
C.S. Peirce this reasoning violates what he called the "rule of balancing reasons", because
the difference between the "arguments" in favor of and against the outcome of black is 20
in both cases. Why? (I.e. why does that rule not apply here?)

1 Introduction

Let us take the following example from C.S. Peirce's The Probability of Induction [1]:

"Suppose we have a large bag of beans from which one has been secretly taken at 
random and hidden under a thimble. We are now to form a probable judgement 
of the color of that bean, by drawing others singly from the bag and looking 
at them, each one to be thrown back, and the whole well mixed up after each 
drawing. 

Suppose that the first bean which we drew from our bag were black. That
would constitute an argument, no matter how slender, that the bean under
the thimble was also black. If the second bean were also to turn out black, that
would be a second independent argument reenforcing the first. If the whole
of the first twenty beans drawn should prove black, our confidence that the
hidden bean was black would justly attain considerable strength. But suppose
the twenty-first bean were to be white and that we were to go on drawing until
we found that we had drawn 1,010 black beans and 990 white ones. We would
conclude that our first twenty beans being black was simply an extraordinary
accident, and that in fact the proportion of white beans to black was sensibly
equal, and that it was an even chance that the hidden bean was black. Yet
according to the rule of balancing reasons, since all the drawings of black beans
are so many independent arguments in favor of the one under the thimble being
black, and all the white drawings so many against it, an excess of twenty black
beans ought to produce the same degree of belief that the hidden bean was
black, whatever the total number drawn." [1]

The philosopher does not try to resolve the manifest contradiction in the rest of the article,
and the question is left to the reader as a kind of paradox of what he calls the
"conceptualistic view of probability" (nowadays 'subjective probability'). Its solution is
nevertheless rather easy: his 'rule of balancing reasons' does not apply to the first practical
example he provides, because the 'arguments' are not independent.



2 Which box? Which color? 

Let us consider a slightly different problem. We have two boxes, $B_1$ and $B_2$, containing
well-known proportions $p_1$ and $p_2$ of white balls, respectively (the remaining ones are black).
If we make random extractions ($E_i$) with reintroduction, the probabilities of getting black (B)
and white (W) balls are:

$$P(W \mid B_1, I) = p_1 \qquad (1)$$
$$P(B \mid B_1, I) = 1 - p_1 \qquad (2)$$
$$P(W \mid B_2, I) = p_2 \qquad (3)$$
$$P(B \mid B_2, I) = 1 - p_2 \qquad (4)$$

where the symbol '|' stands for 'given', i.e. 'under the condition', whereas the ubiquitous
'I' stands for the state of information under which probability values are assessed.

If we take one of the boxes at random (hereafter $B_?$), it could equally likely be $B_1$ or $B_2$,
and then the probabilities of getting black or white are the averages of the probabilities
given the two box compositions. As soon as we start sampling the box content by extractions
followed by reintroduction, our opinion concerning the box composition is modified by the
experimental information, and the probability of occurrence of white in the next extraction
is modified too (see footnote 1).

2.1 Probability of white/black from a box of unknown composition

In general, if, after n observations, our beliefs in the two kinds of boxes are
$P(B_1 \mid E, B_?, I)$ and $P(B_2 \mid E, B_?, I)$, the probability that the next extraction gives
white is given by

$$P(W \mid E, B_?, I) = P(W \mid B_1, I)\, P(B_1 \mid E, B_?, I) + P(W \mid B_2, I)\, P(B_2 \mid E, B_?, I) \qquad (5)$$
$$= p_1\, P(p_1 \mid E, B_?, I) + p_2\, P(p_2 \mid E, B_?, I), \qquad (6)$$

where E is the ensemble of all observations, i.e. $E = \{E_1, E_2, \ldots, E_n\}$. In Eq. (6),
$P(B_i \mid E, B_?, I)$ has been replaced by $P(p_i \mid E, B_?, I)$ to remark that our belief in a given
proportion is equal to our belief in the corresponding box type. Note that, since
$P(p_1 \mid E, B_?, I) + P(p_2 \mid E, B_?, I) = 1$, Eq. (6) can be read as a weighted average. If, instead
of just two possible box compositions, we have many, Eq. (6) becomes

$$P(W \mid E, B_?, I) = \sum_i p_i\, P(p_i \mid E, B_?, I). \qquad (7)$$

Footnote 1: If the number of balls in the box is very large, thinking of the next extraction, or
taking a ball at random at the very beginning and hiding it "under a thimble", is practically the same.
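The weighted average of Eqs. (6) and (7) can be sketched in a few lines of Python. This is only an illustration: the function name `prob_white_next` and the numerical beliefs used in the example are assumptions made here, not values taken from the paper.

```python
def prob_white_next(proportions, beliefs):
    """Eq. (7): sum over compositions of p_i * P(p_i | E, B_?, I)."""
    if len(proportions) != len(beliefs):
        raise ValueError("one belief is needed for each proportion")
    if abs(sum(beliefs) - 1.0) > 1e-12:
        raise ValueError("beliefs must sum to 1")
    return sum(p * b for p, b in zip(proportions, beliefs))


# Two boxes with white proportions 1/4 and 3/4 (illustrative values).
before = prob_white_next([0.25, 0.75], [0.5, 0.5])   # equal beliefs in the two boxes
after = prob_white_next([0.25, 0.75], [0.9, 0.1])    # hypothetical beliefs favoring box 1
print(before, after)
```

With equal beliefs the result is the plain average of the two proportions; as the beliefs shift towards one box, the probability of white moves towards that box's proportion.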



2.2 Probability of the different box compositions given the past observations

As far as the updating of probability is concerned, the most convenient way in the case of
two hypotheses is to update the probability ratios (odds) via the Bayes factor. Since
the events black or white are independent given a box composition, and using the notation
of Ref. [2] (sections 2.3 and 2.4), we can write

$$O_{1,2}(E, I) = \tilde{O}_{1,2}(E, I) \times O_{1,2}(I), \qquad (8)$$

where the prior odds $O_{1,2}(I)$ are unitary in this case ('even'), while the overall Bayes factor
is

$$\tilde{O}_{1,2}(E, I) = \prod_{k=1}^{n} \tilde{O}_{1,2}(E_k, I), \qquad (9)$$

with

$$O_{1,2}(E, I) = \frac{P(p_1 \mid E, B_?, I)}{P(p_2 \mid E, B_?, I)} \qquad (10)$$
$$\tilde{O}_{1,2}(E_k, I) = \frac{P(E_k \mid p_1, I)}{P(E_k \mid p_2, I)}, \qquad (11)$$

where the Bayes factors due to each piece of evidence are written as $\tilde{O}_{1,2}(E_k, I)$ to remark
that they would be the odds only considering the individual piece of evidence $E_k$, provided
the two hypotheses were otherwise considered equally likely.



2.2.1 Logarithmic update and weight of evidence 

The update rule (8) can be turned into an additive rule if, as first (as far as I know) proposed
by Peirce in the same paper as the bag of beans example (see footnote 2), we take the logarithm
of it. Using the notation of Ref. [2], we can rewrite Eq. (8) as

$$\mathrm{JL}_{1,2}(E, I) = \Delta\mathrm{JL}_{1,2}(E, I) + \mathrm{JL}_{1,2}(I), \qquad (12)$$

with

$$\Delta\mathrm{JL}_{1,2}(E, I) = \sum_{k=1}^{n} \Delta\mathrm{JL}_{1,2}(E_k, I), \qquad (13)$$

where the JL's and the $\Delta$JL's are base 10 logarithms of odds and of Bayes factors,
respectively. The JL's (judgement leaning) correspond to Peirce's intensities of belief, their
variation ($\Delta$JL) being due to the weight of evidence (see Ref. [1] and Appendix E of Ref. [2]).

Footnote 2: This paper is strictly related to Ref. [2], because I discovered Peirce's The Probability
of Induction while doing some short historical research on the use of the logarithmic updating of
odds (see Appendix E there).

The contributions $\Delta\mathrm{JL}_{1,2}(E_k, I)$ can be positive or negative, depending on whether the
corresponding Bayes factors are larger or smaller than one, and they are considered by Peirce
as arguments in favor of or against hypothesis 1 ($B_1$, or $p_1$, here):

"The rule of the combination of independent arguments takes a very simple form 
when expressed in terms of the intensity of belief, measured in the proposed way. 
It is this: Take the sum of all the feelings of belief which would be produced 
separately by all the arguments pro, subtract from that the similar sum for 
arguments con, and the remainder is the feeling of belief which we ought to 
have on the whole. This is a proceeding which men often resort to, under the
name of balancing reasons." [1]

At this point we only need to write down the weights of evidence due to the observation of
the different colors:

$$\Delta\mathrm{JL}_{1,2}(E_k = W, I) = \log_{10} \frac{p_1}{p_2} \qquad (14)$$
$$\Delta\mathrm{JL}_{1,2}(E_k = B, I) = \log_{10} \frac{1 - p_1}{1 - p_2}. \qquad (15)$$

As we see, the absolute weight of the 'arguments' depends on the values of $p_1$ and $p_2$. If they
are very similar, the indication provided by the experimental information is very weak and
we need a very large number of observations to discriminate between the two hypotheses
(see e.g. Appendix G of Ref. [2]). If, instead, the proportion of one kind of ball is very
close to 0 or to 1, the indications can be rather strong and just one or a few extractions
make us highly confident about the box composition. In the limit, if one of the boxes
contains only white or only black balls, a single observation showing the opposite color is
enough to rule out that hypothesis ($|\Delta\mathrm{JL}_{1,2}(E_k, I)| = \infty$).
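As a minimal sketch of Eqs. (14) and (15), the following Python function returns the weight of evidence of a single extraction, including the limiting case in which one hypothesis is ruled out. The function name `weight_of_evidence` and the example proportions are illustrative assumptions, not part of the original text.

```python
import math


def weight_of_evidence(color, p1, p2):
    """Delta JL_{1,2} of one extraction, Eqs. (14)-(15); color is 'W' or 'B'."""
    if color == "W":
        num, den = p1, p2
    elif color == "B":
        num, den = 1.0 - p1, 1.0 - p2
    else:
        raise ValueError("color must be 'W' or 'B'")
    if num == 0.0 and den == 0.0:
        raise ValueError("outcome impossible under both hypotheses")
    if den == 0.0:
        return math.inf      # hypothesis 2 is ruled out
    if num == 0.0:
        return -math.inf     # hypothesis 1 is ruled out
    return math.log10(num / den)


# Similar proportions: each extraction carries very little weight.
print(weight_of_evidence("W", 0.50, 0.51))
# Box 2 contains only white balls: a single black ball rules it out.
print(weight_of_evidence("B", 0.50, 1.00))
```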

2.2.2 Combined weight of evidence and final odds after a sequence of extractions

Since the weights of evidence due to independent pieces of evidence sum up, after the
observation of $n_W$ white and $n_B$ black we have

$$\Delta\mathrm{JL}_{1,2}(n_W, n_B, I) = n_W \log_{10} \frac{p_1}{p_2} + n_B \log_{10} \frac{1 - p_1}{1 - p_2}, \qquad (16)$$

or, since we started from uniform priors,

$$O_{1,2}(n_W, n_B, I) = \left(\frac{p_1}{p_2}\right)^{n_W} \left(\frac{1 - p_1}{1 - p_2}\right)^{n_B}. \qquad (17)$$

The probabilities of the two box compositions ($i = 1, 2$) are then

$$P(p_i \mid n_W, n_B, I) = \frac{p_i^{n_W} (1 - p_i)^{n_B}}{p_1^{n_W} (1 - p_1)^{n_B} + p_2^{n_W} (1 - p_2)^{n_B}}. \qquad (19)$$

This formula can easily be extended to the case of many box compositions:

$$P(p_i \mid n_W, n_B, I) = \frac{p_i^{n_W} (1 - p_i)^{n_B}}{\sum_j p_j^{n_W} (1 - p_j)^{n_B}}. \qquad (20)$$

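Eq. (20) translates directly into a short normalization routine. The sketch below assumes, as in the text, that all compositions are initially equally likely; the function name `posterior_compositions` and the three example proportions are hypothetical choices made for illustration.

```python
def posterior_compositions(proportions, n_white, n_black):
    """Eq. (20): P(p_i | n_W, n_B, I) for compositions that are equally likely a priori."""
    likelihoods = [p**n_white * (1.0 - p)**n_black for p in proportions]
    total = sum(likelihoods)
    if total == 0.0:
        raise ValueError("observations impossible under every composition")
    return [lk / total for lk in likelihoods]


# Three candidate white proportions, after observing 1 white and 3 black balls.
posterior = posterior_compositions([0.25, 0.50, 0.75], n_white=1, n_black=3)
print(posterior)
```

For thousands of extractions the powers underflow in floating point, so a practical implementation would rather work with logarithms, in the spirit of the additive rule of Eq. (16).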
2.3 Case with two symmetric bag compositions 

A particular case, which can be useful to clarify the difference with respect to the different
problem discussed in the following section, is when $p_2 = 1 - p_1$, for example $p_1 = 1/4$ and
$p_2 = 3/4$. In this case we have

$$\Delta\mathrm{JL}_{1,2}(E_k = W, I) = \log_{10} \frac{p_1}{1 - p_1} \qquad (21)$$
$$\Delta\mathrm{JL}_{1,2}(E_k = B, I) = \log_{10} \frac{1 - p_1}{p_1}, \qquad (22)$$

i.e.

$$\Delta\mathrm{JL}_{1,2}(E_k = B, I) = -\Delta\mathrm{JL}_{1,2}(E_k = W, I): \qquad (23)$$

the weights of evidence provided by black and white have opposite sign but are equal
in modulus. It follows from Eq. (16) that the combined weight of evidence is
$(n_B - n_W) \log_{10}[(1 - p_1)/p_1]$, so that our judgement in favor of the two boxes depends only
on the difference between the numbers of black and white balls observed, and not on the number
of extractions. Here, then, the rule of balancing reasons applies.
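The symmetric case can be checked numerically with Eq. (16). In this sketch the function name `total_weight` is an assumption introduced here, and the proportions are the illustrative ones of the text ($p_1 = 1/4$, $p_2 = 3/4$); the counts are Peirce's.

```python
import math


def total_weight(n_white, n_black, p1, p2):
    """Eq. (16): combined Delta JL_{1,2} after n_white white and n_black black balls."""
    return (n_white * math.log10(p1 / p2)
            + n_black * math.log10((1.0 - p1) / (1.0 - p2)))


p1 = 0.25                                        # so that p2 = 1 - p1 = 0.75
few = total_weight(0, 20, p1, 1.0 - p1)          # twenty black in a row
many = total_weight(990, 1010, p1, 1.0 - p1)     # 1010 black and 990 white
print(few, many)                                 # both close to 20 * log10(3)

# With non-symmetric compositions the two cases no longer agree.
print(total_weight(0, 20, 0.25, 0.5), total_weight(990, 1010, 0.25, 0.5))
```

Only when $p_2 = 1 - p_1$ do the two sequences give the same combined weight, which is the situation in which the balance of 'arguments' is all that matters.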

3 From many to (virtually) infinite box compositions 

In the limit that the number of possible compositions is virtually infinite, the parameter p
that gives the white ball proportion becomes continuous and the problem is solved in terms
of a probability density function $f(p \mid E, I)$. Essentially Eq. (20) becomes

$$f(p \mid n_W, n_B, I) = \frac{p^{n_W} (1 - p)^{n_B}}{\int_0^1 p^{n_W} (1 - p)^{n_B}\, dp}, \qquad (24)$$

that gives

$$f(p \mid n_W, n_B, I) = \frac{(n_W + n_B + 1)!}{n_W!\, n_B!}\, p^{n_W} (1 - p)^{n_B}. \qquad (25)$$

Figure 1: Probability density function of the white ball proportion p, having observed
exactly 50% white balls in 2, 4, 10, 20, 50 and 100 extractions (the curves become progressively
narrower). The horizontal line represents the uniform prior.

Some examples of $f(p \mid n_W, n_B, I)$ are given in Figure 1 for several numbers of extractions,
assuming that in all cases the fraction of white balls has been 50%. We see that with
the increasing number of extractions we become more and more confident that p is around 0.50
(see footnote 3).

The most probable values of p are those around $n_W/(n_W + n_B)$, although the probability
of having a white ball in a future observation has to take into account all possible compositions
in a manner similar to that seen in Eq. (7), i.e.

$$P(W \mid E, B_?, I) = \int_0^1 p\, f(p \mid E, B_?, I)\, dp. \qquad (26)$$

In the r.h.s. of (26) we recognize the expected value of p, the 'barycenter' of the probability
density function. Therefore it follows that

$$P(W \mid E, B_?, I) = E[p \mid E, B_?, I] = \int_0^1 p\, f(p \mid E, B_?, I)\, dp. \qquad (27)$$

The result of the integral (27) is

$$\int_0^1 p\, f(p \mid E, B_?, I)\, dp = \frac{n_W + 1}{n_W + n_B + 2}, \qquad (28)$$

thus leading to the famous (although often misused!) Laplace rule of succession

$$P(W \mid E, B_?, I) = \frac{n_W + 1}{n_W + n_B + 2} \qquad (29)$$

and, by symmetry,

$$P(B \mid E, B_?, I) = \frac{n_B + 1}{n_W + n_B + 2}. \qquad (30)$$

Using Peirce's numbers, we get

$$P(B \mid 20\,B, B_?, I) = \frac{21}{22} \approx 95.5\% \qquad (31)$$
$$P(B \mid 1010\,B + 990\,W, B_?, I) = \frac{1011}{2002} \approx 50.5\%, \qquad (32)$$

providing quite different degrees of belief, as is intuitive and as was clear to Peirce,
who, by the way, uses Laplace's rule of succession several times in Ref. [1].

Footnote 3: The uncertainty, measured by the standard deviation of the distribution, $\sigma(p)$, is given by

$$\sigma(p) = \sqrt{\frac{E[p]\,(1 - E[p])}{n_W + n_B + 3}},$$

with E[p] equal to the expected value, given by Eq. (28).

Note that $\sigma(p)$ is the uncertainty about the proportion of white balls in the box and not about
the probability of having white in the next extraction, which is exactly 1/2 in all cases in which
an equal number of black and white balls has been observed! It seems that this point was not very
clear to Peirce, who writes in Ref. [1], also referring to the large "bag of beans", after the first
period of the quote reported in the Introduction:

"Suppose the first drawing is white and the next black. We conclude that there is not an
immense preponderance of either color, and that there is something like an even chance that
the bean under the thimble is black. But this judgement might be altered by the next few
drawings. When we have drawn ten times, if 4, 5, or 6, are white, we have more confidence that
the chance [that the bean under the thimble is black, we have to understand] is even. When
we have drawn a thousand times, if about half have been white, we have great confidence in
this result." [1]

To be more precise, there are several things that should be kept separate in our reasoning:

• The proportion of white balls in the box, that is p, our uncertainty concerning it being
described by the probability density function (25), with expected value given by Eq. (28) and
'standard uncertainty' $\sigma(p)$ given at the beginning of this footnote (results valid for a
uniform prior).

• The relative frequency of white balls that we expect in a series of m extractions, given the past
observation of $n_W$ and $n_B$. The expression that gives our beliefs on all possible (in number m + 1)
values of the relative frequency is quite complicated and can be found in section 7.3 of Ref. [3]. The
case in which m tends to infinity is instead rather easy to understand, since, calling $\varphi_m$ the
possible value of the relative frequency in m extractions, we have, under the assumption that p is
perfectly known,

$$E[\varphi_m \mid p, I] = p$$
$$\sigma[\varphi_m \mid p, I] = \frac{\sqrt{p\,(1 - p)}}{\sqrt{m}}.$$

That is, in the limit $m \to \infty$, we feel practically sure to observe a value of $\varphi_\infty$ equal
to p ('Bernoulli theorem'). If we are, instead, uncertain about p, then we are uncertain about
$\varphi_\infty$ exactly in the same way, and the probability density function
$f(\varphi_\infty \mid n_W, n_B, I)$ has the same shape as $f(p \mid n_W, n_B, I)$.

• The probability that the next outcome will be white, the evaluation of which has to take into
consideration all possible values of p, each weighted by how much we believe it (to be precise, since
the proportion is virtually a real number, the beliefs concern small intervals of p). But this is
exactly the expected value of p, according to the relation (27). In the particular case of absolute
symmetry of our observations and of our prior, there should not be the slightest rational preference
in favor of either color [$P(W \mid n_W = n_B, B_?, I) = 1/2$], no matter if we are very uncertain about
box composition or future relative frequencies.
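Laplace's rule of succession, Eqs. (29) and (30), is easy to evaluate exactly with rational arithmetic. The following sketch applies it to Peirce's two cases; the function name `laplace_rule` is an assumption made for this illustration, and the result holds only for a uniform prior on p.

```python
from fractions import Fraction


def laplace_rule(n_same, n_other):
    """Probability that the next draw has the color already seen n_same times
    (n_other draws had the other color), assuming a uniform prior on p."""
    return Fraction(n_same + 1, n_same + n_other + 2)


p_black_after_20 = laplace_rule(20, 0)          # twenty black in a row: 21/22
p_black_after_2000 = laplace_rule(1010, 990)    # 1010 black, 990 white: 1011/2002
print(float(p_black_after_20), float(p_black_after_2000))
```

The same excess of twenty black beans thus gives about 95.5% in one case and about 50.5% in the other.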

4 Weights of evidence in favor of a black or white bean under the thimble

We now have all the tools to analyze Peirce's bag of beans example, in which the 'arguments'
did not regard the box composition, but the occurrence of a white or black bean in a future
extraction (the fact that the bean was extracted at the beginning is irrelevant, as has
already been remarked).

Since the bag is 'large', the proportion of white beans can be considered as a real number
p ranging between 0 and 1. Moreover, as implicitly assumed by Peirce (but this specific
assumption is not strictly needed for the main conclusions of the paper), we judge that the
value of p could lie with equal probability in any small interval in the range between 0 and
1 ('uniform prior'). Therefore we can use the results obtained in the previous section.

Let us now calculate the weight of evidence provided by the observation of black or
white in favor of the occurrence of black or white. We need to calculate the Bayes factor
that changes the odds ratio $P(W \mid E, B_?, I)/P(B \mid E, B_?, I)$ if we add a further observation
$E_{n+1}$, schematically

$$\frac{P(W \mid E, B_?, I)}{P(B \mid E, B_?, I)} \longrightarrow \frac{P(W \mid E, E_{n+1}, B_?, I)}{P(B \mid E, E_{n+1}, B_?, I)}. \qquad (33)$$

This updating factor cannot be calculated directly in an easy way, but it can nevertheless be
evaluated indirectly from its definition ('final odds divided by initial odds'). In fact, the
evaluation of the initial and final odds is very simple, just applying Laplace's rule. For the
former we have

$$O_{W,B}(n_W, n_B, I) = \frac{n_W + 1}{n_B + 1}. \qquad (34)$$

The observation of a new white or of a new black changes this ratio into

$$O_{W,B}(n_W + 1, n_B, I) = \frac{(n_W + 1) + 1}{n_B + 1} \qquad (35)$$
$$O_{W,B}(n_W, n_B + 1, I) = \frac{n_W + 1}{(n_B + 1) + 1}, \qquad (36)$$

respectively. Dividing Eqs. (35) and (36) by (34) we get the updating factors of interest:

$$\tilde{O}_{W,B}(W, n_W, n_B, I) = \frac{n_W + 2}{n_W + 1} \qquad (37)$$
$$\tilde{O}_{W,B}(B, n_W, n_B, I) = \frac{n_B + 1}{n_B + 2}. \qquad (38)$$

Contrary to other usual Bayes factors, they depend on the number of like-color
balls already observed. [We can easily understand that they also depend on the prior on
the box composition, and therefore Eqs. (37)-(38) are only valid for a uniform prior.]

Let us apply these formulae to Peirce's example. The first observation of black yields
an updating factor of 1/2, or $\Delta$JL = -0.30; the second 2/3, or $\Delta$JL = -0.18; the third
3/4, or $\Delta$JL = -0.12; and so on. The updating factor produced by the 20th observation
is only 20/21 = 0.952, which corresponds to the small weight of evidence $\Delta$JL = -0.02. The
overall factor is 1/21 = 0.048 ($\Delta$JL = -1.32), which is also equal to the final odds (the two
hypotheses were considered initially equally likely), from which a probability of 4.5% for
white and 95.5% for black can be calculated.

If the 21st extraction results, instead, in white, the new updating factor is 2 ($\Delta$JL =
0.30), which sizeably changes the overall updating factor: it becomes 2/21, or $\Delta$JL =
-1.02, thus almost doubling the probability of white, which becomes 8.7%. This is very
interesting: the first time either color occurs it changes the odds by a factor of two in favor
of that color ($|\Delta$JL$|$ = 0.30). A second observation of white gives an updating factor of 3/2
($\Delta$JL = 0.18), and so on.
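The sequence of updating factors quoted above can be reproduced with exact fractions. This is a sketch under the uniform-prior assumption of Eqs. (37) and (38); the names `updating_factor` and `odds_white_vs_black`, and the encoding of a sequence as a string of 'W' and 'B', are choices made here for illustration.

```python
from fractions import Fraction


def updating_factor(color, n_white, n_black):
    """Eqs. (37)-(38): factor applied to the white-vs-black odds by one more draw."""
    if color == "W":
        return Fraction(n_white + 2, n_white + 1)
    if color == "B":
        return Fraction(n_black + 1, n_black + 2)
    raise ValueError("color must be 'W' or 'B'")


def odds_white_vs_black(sequence):
    """Multiply the factors along a sequence such as 'BBW', starting from even odds."""
    odds, n_white, n_black = Fraction(1), 0, 0
    for color in sequence:
        odds *= updating_factor(color, n_white, n_black)
        if color == "W":
            n_white += 1
        else:
            n_black += 1
    return odds


odds_20 = odds_white_vs_black("B" * 20)           # 1/21
odds_21 = odds_white_vs_black("B" * 20 + "W")     # 2/21
print(odds_20, float(odds_20 / (1 + odds_20)))    # probability of white: 1/22
print(odds_21, float(odds_21 / (1 + odds_21)))    # probability of white: 2/23
```

Since each factor depends only on how many beans of the same color have already been seen, the final odds do not depend on the order of the draws, only on the two counts.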

If we observe 1010 black and 990 white balls, the overall updating factor can be written as
the product of a factor due to the white beans and a factor due to the black ones (each single
factor depends only on the number of like-color beans already observed), i.e.
$\tilde{O}_{W,B}(n_W = 990, n_B = 1010, I) = \tilde{O}_{W,B}(n_W = 990, I) \times \tilde{O}_{W,B}(n_B = 1010, I)$, with

$$\tilde{O}_{W,B}(n_B = 1010, I) = \frac{1}{2} \times \frac{2}{3} \times \frac{3}{4} \times \cdots \times \frac{1010}{1011} \qquad (39)$$
$$\tilde{O}_{W,B}(n_W = 990, I) = 2 \times \frac{3}{2} \times \frac{4}{3} \times \cdots \times \frac{991}{990}. \qquad (40)$$

In the product the first 990 factors of $\tilde{O}_{W,B}(n_B = 1010, I)$ are cancelled by
$\tilde{O}_{W,B}(n_W = 990, I)$ and the final result is

$$\tilde{O}_{W,B}(n_W = 990, n_B = 1010, I) = \frac{991}{992} \times \frac{992}{993} \times \cdots \times \frac{1010}{1011} = \frac{991}{1011}: \qquad (41)$$

the 20 residual 'arguments' in favor of black are not the early ones (in which case the result
would be equal to the observation of twenty black in a row) but the late ones, individually
very small ($\Delta$JL about -0.0004 each). All together they provide a negligible weight of
evidence in favor of black, with a combined $\Delta$JL of -0.0087, and the result of Eq. (32),
a probability of about 50.5% for black, is reobtained.
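The cancellation leading to Eq. (41) can be verified with exact arithmetic. In this sketch the helper names `black_factor` and `white_factor` are assumptions introduced here; they simply restate Eqs. (38) and (37) in terms of the ordinal number k of the draw of that color.

```python
import math
from fractions import Fraction


def black_factor(k):
    """Factor from the k-th black draw (k = 1, 2, ...): Eq. (38) with n_B = k - 1."""
    return Fraction(k, k + 1)


def white_factor(k):
    """Factor from the k-th white draw (k = 1, 2, ...): Eq. (37) with n_W = k - 1."""
    return Fraction(k + 1, k)


black_total = math.prod(black_factor(k) for k in range(1, 1011))   # Eq. (39)
white_total = math.prod(white_factor(k) for k in range(1, 991))    # Eq. (40)
overall = black_total * white_total                                 # Eq. (41)
residual = math.prod(black_factor(k) for k in range(991, 1011))    # last 20 black draws

print(overall, residual, math.log10(overall))
```

The overall factor coincides with the product of the last twenty black factors alone, each of which is very close to one; this is why an excess of twenty blacks out of two thousand draws counts for so little.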
5 Conclusions



This example shows the danger of drawing quantitative conclusions from qualitative,
intuitive considerations (an issue extensively discussed in Ref. [2]). Yes, each observation of
black is 'an argument' in favor of the opinion that the bean 'under the thimble' is black.
But the arguments do not have the same strength, and therefore the final 'intensity of belief'
(to use a very interesting expression by Peirce) does not depend simply on the difference of
their numbers in favor of either color.